Add StructArray and RunArray benchmark tests to with_hashes#20182
Add StructArray and RunArray benchmark tests to with_hashes#20182notashes wants to merge 5 commits intoapache:mainfrom
StructArray and RunArray benchmark tests to with_hashes#20182Conversation
| do_hash_test(b, &arrays); | ||
| }); | ||
|
|
||
| // Union arrays can't have null bitmasks |
There was a problem hiding this comment.
Mentioning union array when we don't implement that here?
There was a problem hiding this comment.
I've copied that from the other PR verbatim 😅 (to not have merge conflicts in the future?). but I'm getting a sense that it's the wrong approach here!
| .clone() | ||
| .into_data() | ||
| .into_builder() | ||
| .nulls(Some(create_null_mask(values.len()))) |
There was a problem hiding this comment.
Something to think about is how null density acts differently here for run arrays, since we'd apply null on entire runs 🤔
There was a problem hiding this comment.
I was thinking about it for a while. It probably should come up to be around the same 3% zone even though the variance could be a bit high.
I've set the run_length to be within 1..50.
Let's say we have ~300 runs on average, with each each one carrying ~25 elements. 3% of which will roughly translate to 10 * 25 = 250. But yes that is probably our ideal scenario.
let me know what you think? i'll try to do some testing regarding this.
| ) | ||
| } | ||
|
|
||
| fn string_array(array_len: usize) -> ArrayRef { |
There was a problem hiding this comment.
Do we need this if we already have StringPool above?
There was a problem hiding this comment.
done! don't think a different one offers any benefit! both seem to give me close to 10% speed up locally (with the struct_array optimization)
Which issue does this PR close?
StructArrayandRunArraybenchmarks towith_hashessuite in datafusion-common #20181Rationale for this change
Issue #20152 shows some areas of optimization for
RunArrayandStructArrayhashing. But the existingwith_hashesbenchmark tests don't include coverage for these!What changes are included in this PR?
Added benchmarks to
with_hashes.rs:Are these changes tested?
No additional tests added, but the benchmarks both compile and run.
a sample run:
Are there any user-facing changes?